Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence

Ravichandran, Zachary, Cladera, Fernando, Prabhu, Ankit, Hughes, Jason, Murali, Varun, Taylor, Camillo, Pappas, George J., Kumar, Vijay

arXiv.org Artificial Intelligence

Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams frequently operate in unstructured environments -- uncertain, open-world settings without prior maps -- subtasks must be grounded in robot capabilities and the physical world. While heterogeneous teams have typically been designed for fixed specifications, generative intelligence opens the possibility of teams that can accomplish a wide range of missions described in natural language. However, current large language model (LLM)-enabled teaming methods typically assume well-structured and known environments, limiting deployment in unstructured environments. We present SPINE-HT, a framework that addresses these limitations by grounding the reasoning abilities of LLMs in the context of a heterogeneous robot team through a three-stage process. Given language specifications describing mission goals and team capabilities, an LLM generates grounded subtasks which are validated for feasibility. Subtasks are then assigned to robots based on capabilities such as traversability or perception and refined given feedback collected during online operation. In simulation experiments with closed-loop perception and control, our framework achieves nearly twice the success rate compared to prior LLM-enabled heterogeneous teaming approaches. In real-world experiments with a Clearpath Jackal, a Clearpath Husky, a Boston Dynamics Spot, and a high-altitude UAV, our method achieves an 87% success rate in missions requiring reasoning about robot capabilities and refining subtasks with online feedback. More information is provided at https://zacravichandran.github.io/SPINE-HT.
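The capability-grounded assignment stage described above can be sketched as a simple feasibility check: each subtask declares the capabilities it requires, and a subtask is only assigned to a robot whose capability set covers it. This is a hypothetical illustration, not the SPINE-HT implementation; the robot names, capability labels, and first-fit assignment rule are all assumptions.

```python
# Hypothetical sketch of capability-grounded subtask assignment.
# Capability labels and the first-fit rule are illustrative assumptions.

def assign_subtasks(subtasks, robots):
    """Assign each subtask to the first robot whose capabilities cover it."""
    assignments = {}
    for task, required in subtasks.items():
        for robot, capabilities in robots.items():
            if required <= capabilities:  # set inclusion: robot covers needs
                assignments[task] = robot
                break
        else:
            assignments[task] = None      # infeasible: no robot qualifies
    return assignments

robots = {
    "jackal": {"ground", "lidar"},
    "uav":    {"aerial", "camera"},
}
subtasks = {
    "map_road":    {"ground", "lidar"},
    "survey_area": {"aerial", "camera"},
    "dive_lake":   {"underwater"},
}
print(assign_subtasks(subtasks, robots))
```

Marking infeasible subtasks with `None` mirrors the paper's idea of validating LLM-generated subtasks before execution rather than dispatching them blindly.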


BiFlex: A Passive Bimodal Stiffness Flexible Wrist for Manipulation in Unstructured Environments

Jeong, Gu-Cheol, Gasperina, Stefano Dalla, Deshpande, Ashish D., Chin, Lillian, Martín-Martín, Roberto

arXiv.org Artificial Intelligence

Robotic manipulation in unstructured, human-centric environments poses a dual challenge: achieving the precision needed for delicate free-space operation while ensuring safety during unexpected contact events. Traditional wrists struggle to balance these demands, often relying on complex control schemes or complicated mechanical designs to mitigate potential damage from force overload. In response, we present BiFlex, a flexible robotic wrist that uses a soft buckling honeycomb structure to provide a natural bimodal stiffness response. The higher stiffness mode enables precise household object manipulation, while the lower stiffness mode provides the compliance needed to adapt to external forces. We design BiFlex to maintain a fingertip deflection of less than 1 cm while supporting loads up to 500 g, and create a BiFlex wrist for many grippers, including Panda, Robotiq, and BaRiFlex. We demonstrate that BiFlex simplifies control while maintaining precise object manipulation and enhanced safety in real-world applications. Designing robots capable of physical tasks in unstructured environments remains one of the core open problems of modern robotics. Unstructured settings are characterized by inherent uncertainty that exposes robotic end-effectors to frequent and unpredictable forces. For example, when grasping a flat object or wiping a surface, inaccuracies in the perceived location could lead to the robot missing the target or creating unexpected, dangerously high reactive forces that could damage the robot.
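The bimodal stiffness behavior can be illustrated with a toy piecewise force model: stiff below a buckling threshold, compliant beyond it. The threshold and stiffness values below are illustrative assumptions, not the BiFlex design parameters.

```python
# Toy piecewise-stiffness model of a bimodal wrist. All numbers here
# (stiffnesses, buckling threshold) are illustrative assumptions.

def restoring_force(deflection_m, k_stiff=2000.0, k_soft=200.0, buckle_m=0.01):
    """Return the restoring force (N) for a given deflection (m)."""
    if abs(deflection_m) <= buckle_m:
        return k_stiff * deflection_m  # high-stiffness mode: precise free-space work
    sign = 1.0 if deflection_m > 0 else -1.0
    # Beyond buckling, force grows from the transition point at a much lower slope,
    # limiting reactive forces during unexpected contact.
    return sign * (k_stiff * buckle_m + k_soft * (abs(deflection_m) - buckle_m))

print(restoring_force(0.005))  # stiff regime
print(restoring_force(0.02))   # compliant regime: force rises only slowly
```

The key point of such a passive design is that the mode switch happens mechanically, with no controller involvement.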


Humanoid Agent via Embodied Chain-of-Action Reasoning with Multimodal Foundation Models for Zero-Shot Loco-Manipulation

Wen, Congcong, Bethala, Geeta Chandra Raju, Hao, Yu, Pudasaini, Niraj, Huang, Hao, Yuan, Shuaihang, Huang, Baoru, Nguyen, Anh, Wang, Mengyu, Tzes, Anthony, Fang, Yi

arXiv.org Artificial Intelligence

Humanoid loco-manipulation, which integrates whole-body locomotion with dexterous manipulation, remains a fundamental challenge in robotics. Beyond whole-body coordination and balance, a central difficulty lies in understanding human instructions and translating them into coherent sequences of embodied actions. Recent advances in foundation models provide transferable multimodal representations and reasoning capabilities, yet existing efforts remain largely restricted to either locomotion or manipulation in isolation, with limited applicability to humanoid settings. In this paper, we propose Humanoid-COA, the first humanoid agent framework that integrates foundation model reasoning with an Embodied Chain-of-Action (CoA) mechanism for zero-shot loco-manipulation. Within the perception--reasoning--action paradigm, our key contribution lies in the reasoning stage, where the proposed CoA mechanism decomposes high-level human instructions into structured sequences of locomotion and manipulation primitives through affordance analysis, spatial inference, and whole-body action reasoning. Extensive experiments on two humanoid robots, Unitree H1-2 and G1, in both an open test area and an apartment environment, demonstrate that our framework substantially outperforms prior baselines across manipulation, locomotion, and loco-manipulation tasks, achieving robust generalization to long-horizon and unstructured scenarios. Project page: https://humanoid-coa.github.io/


Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges

Li, Shuai, Chen, Yizhe, Li, Dong, Liu, Sichao, Lan, Dapeng, Liu, Yu, Pang, Zhibo

arXiv.org Artificial Intelligence

The application of artificial intelligence (AI) in industry is accelerating the shift from traditional automation to intelligent systems with perception and cognition. Vision-language-action (VLA) models have become a key paradigm in AI for unifying perception, reasoning, and control. Has the performance of VLA models met industrial requirements? In this paper, from the perspective of industrial deployment, we compare the performance of existing state-of-the-art VLA models in industrial scenarios and analyze the limitations of VLA models for real-world industrial deployment from the perspectives of data collection and model architecture. The results show that VLA models retain their ability to perform simple grasping tasks even in industrial settings after fine-tuning. However, there is much room for performance improvement in complex industrial environments, diverse object categories, and high-precision placing tasks. Our findings provide practical insight into the adaptability of VLA models for industrial use and highlight the need for task-specific enhancements to improve their robustness, generalization, and precision.


Online Adaptation of Terrain-Aware Dynamics for Planning in Unstructured Environments

Ward, William, Etter, Sarah, Ingebrand, Tyler, Ellis, Christian, Thorpe, Adam J., Topcu, Ufuk

arXiv.org Artificial Intelligence

Autonomous mobile robots operating in remote, unstructured environments must adapt to new, unpredictable terrains that can change rapidly during operation. In such scenarios, a critical challenge becomes estimating the robot's dynamics on changing terrain in order to enable reliable, accurate navigation and planning. We present a novel online adaptation approach for terrain-aware dynamics modeling and planning using function encoders. By learning a set of neural network basis functions that span the robot dynamics on diverse terrains, we enable rapid online adaptation to new, unseen terrains and environments as a simple least-squares calculation. We demonstrate our approach for terrain adaptation in a Unity-based robotics simulator and show that the downstream controller has better empirical performance due to higher accuracy of the learned model. This leads to fewer collisions with obstacles while navigating in cluttered environments as compared to a neural ODE baseline. Rapid adaptation to unknown environments and terrain is critical for autonomous mobile robots. In off-road navigation, unpredictable terrain features such as rocky paths, forest floors, and wet fields can cause skidding, tripping, or immobilization, jeopardizing the robot's ability to reach its objective. Autonomous ground vehicles must therefore dynamically adjust their behavior to terrain-specific conditions. This adaptation is challenging because terrain variations directly alter system dynamics. For example, tire response to acceleration depends on surface friction.
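The "adaptation as a least-squares calculation" idea can be shown in miniature: given a fixed set of basis functions spanning possible dynamics, fitting a new terrain reduces to one linear solve over a handful of observations. The hand-written scalar bases below stand in for the learned neural bases in the paper; the data and coefficients are invented for illustration.

```python
import numpy as np

# Toy sketch of function-encoder-style adaptation: fixed basis functions,
# terrain-specific coefficients fit by least squares. The bases here are
# hand-written stand-ins for learned neural network bases.

def basis(x):
    """Evaluate K=3 scalar basis functions at input x; returns shape (N, 3)."""
    return np.stack([x, x**2, np.sin(x)], axis=-1)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
true_coeffs = np.array([0.5, -1.0, 2.0])   # "new terrain" dynamics
y = basis(x) @ true_coeffs                 # observed dynamics samples

# Online adaptation = a single linear least-squares solve, no gradient steps.
coeffs, *_ = np.linalg.lstsq(basis(x), y, rcond=None)
print(np.round(coeffs, 3))
```

Because adaptation is a closed-form solve rather than fine-tuning, it can run online each time the terrain changes.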


LOVON: Legged Open-Vocabulary Object Navigator

Peng, Daojie, Cao, Jiahang, Zhang, Qiang, Ma, Jun

arXiv.org Artificial Intelligence

Object navigation in open-world environments remains a formidable and pervasive challenge for robotic systems, particularly when it comes to executing long-horizon tasks that require both open-world object detection and high-level task planning. Traditional methods often struggle to integrate these components effectively, and this limits their capability to deal with complex, long-range navigation missions. In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. To tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian Variance Filtering for visual stabilization. We also develop a functional execution logic for the robot that guarantees LOVON's capabilities in autonomous navigation, task adaptation, and robust task completion. Extensive evaluations demonstrate the successful completion of long-sequence tasks involving real-time detection, search, and navigation toward open-vocabulary dynamic targets. In recent years, large language models (LLMs) [1] and vision models [2]-[5] have achieved revolutionary breakthroughs in the field of artificial intelligence.
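The Laplacian variance mentioned above is a classic sharpness score: convolve a frame with a Laplacian kernel and measure the variance of the response, which collapses toward zero for blurred or jittered frames. A minimal NumPy sketch of that metric, assuming the standard 4-neighbor kernel (LOVON's exact filtering rule and thresholds are not specified here):

```python
import numpy as np

# Minimal variance-of-Laplacian sharpness score. Kernel choice and any
# filtering threshold are assumptions, not LOVON's exact design.

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def laplacian_variance(img):
    """Variance of the Laplacian response; low values suggest blur/jitter."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):              # 3x3 convolution via shifted slices
        for j in range(3):
            out += LAPLACIAN[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out.var()

sharp = np.tile([0.0, 1.0], (8, 8))   # high-contrast alternating columns
blurry = np.full((8, 16), 0.5)        # flat frame: no edges at all
print(laplacian_variance(sharp), laplacian_variance(blurry))
```

A stabilization filter would drop (or down-weight detections from) frames whose score falls below a tuned threshold.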


Gaussian Splatting as a Unified Representation for Autonomy in Unstructured Environments

Ong, Dexter, Tao, Yuezhan, Murali, Varun, Spasojevic, Igor, Kumar, Vijay, Chaudhari, Pratik

arXiv.org Artificial Intelligence

In this work, we argue that Gaussian splatting is a suitable unified representation for autonomous robot navigation in large-scale unstructured outdoor environments. Such environments require representations that can capture complex structures while remaining computationally tractable for real-time navigation. We demonstrate that the dense geometric and photometric information provided by a Gaussian splatting representation is useful for navigation in unstructured environments. Additionally, semantic information can be embedded in the Gaussian map to enable large-scale task-driven navigation. From the lessons learned through our experiments, we highlight several challenges and opportunities arising from the use of such a representation for robot autonomy. In environments such as those in Figure 1, traditional approaches often struggle to capture the complexity and variability of the scene, presenting challenges for autonomous navigation under such conditions. These capabilities are crucial for applications such as precision agriculture [1], forestry [2], search-and-rescue [3] and infrastructure inspection [4]. To address this, we present Gaussian splatting as a versatile representation for large-scale autonomy in unstructured outdoor environments.


Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities

Ravichandran, Zachary, Cladera, Fernando, Hughes, Jason, Murali, Varun, Hsieh, M. Ani, Pappas, George J., Taylor, Camillo J., Kumar, Vijay

arXiv.org Artificial Intelligence

The integration of foundation models (FMs) into robotics has enabled robots to understand natural language and reason about the semantics in their environments. However, existing FM-enabled robots primarily operate in closed-world settings, where the robot is given a full prior map or has a full view of its workspace. This paper addresses the deployment of FM-enabled robots in the field, where missions often require a robot to operate in large-scale and unstructured environments. To effectively accomplish these missions, robots must actively explore their environments, navigate obstacle-cluttered terrain, handle unexpected sensor inputs, and operate with compute constraints. We discuss recent deployments of SPINE, our LLM-enabled autonomy framework, in field robotic settings. To the best of our knowledge, we present the first demonstration of large-scale LLM-enabled robot planning in unstructured environments with several kilometers of missions. SPINE is agnostic to a particular LLM, which allows us to distill small language models capable of running onboard size, weight, and power (SWaP)-limited platforms. Via preliminary model distillation work, we then present the first language-driven UAV planner using on-device language models. We conclude our paper by proposing several promising directions for future research.


DRPA-MPPI: Dynamic Repulsive Potential Augmented MPPI for Reactive Navigation in Unstructured Environments

Fuke, Takahiro, Endo, Masafumi, Honda, Kohei, Ishigami, Genya

arXiv.org Artificial Intelligence

Reactive mobile robot navigation in unstructured environments is challenging when robots encounter unexpected obstacles that invalidate previously planned trajectories. Model predictive path integral control (MPPI) enables reactive planning, but still suffers from limited prediction horizons that lead to local minima traps near obstacles. Current solutions rely on heuristic cost design or scenario-specific pre-training, which often limits their adaptability to new environments. We introduce dynamic repulsive potential augmented MPPI (DRPA-MPPI), which dynamically detects potential entrapments on the predicted trajectories. Upon detecting local minima, DRPA-MPPI automatically switches between standard goal-oriented optimization and a modified cost function that generates repulsive forces away from local minima. Comprehensive testing in simulated obstacle-rich environments confirms that DRPA-MPPI achieves superior navigation performance and safety compared to conventional methods, at lower computational cost.
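The cost-switching idea can be sketched in a few lines: score rollouts against the goal as usual, and when an entrapment is detected, add a repulsive potential around the suspected local minimum so those rollouts become expensive. The gains, the inverse-distance potential, and the boolean entrapment flag below are all illustrative assumptions standing in for DRPA-MPPI's detector and cost design.

```python
import math

# Hypothetical sketch of a switched MPPI rollout cost: goal attraction
# always, plus a repulsive potential only when entrapment is detected.
# Gains and the inverse-distance potential are illustrative assumptions.

def trajectory_cost(states, goal, trap_center, entrapped):
    cost = 0.0
    for x, y in states:
        cost += math.hypot(goal[0] - x, goal[1] - y)       # goal attraction
        if entrapped:                                       # DRPA-style switch
            d = math.hypot(trap_center[0] - x, trap_center[1] - y)
            cost += 5.0 / (d + 1e-3)                        # repulsion from trap
    return cost

states = [(1.0, 0.0), (1.1, 0.0)]          # rollout stalled near the trap
goal, trap_center = (5.0, 0.0), (1.2, 0.0)
plain = trajectory_cost(states, goal, trap_center, entrapped=False)
repulsed = trajectory_cost(states, goal, trap_center, entrapped=True)
print(repulsed > plain)
```

Under the switched cost, MPPI's sample weighting naturally favors rollouts that leave the trap region, without retuning the standard goal-directed objective.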


Bench2FreeAD: A Benchmark for Vision-based End-to-end Navigation in Unstructured Robotic Environments

Peng, Yuhang, Wang, Sidong, Yang, Jihaoyu, Li, Shilong, Wang, Han, Gong, Jiangtao

arXiv.org Artificial Intelligence

Most current end-to-end (E2E) autonomous driving algorithms are built on standard vehicles in structured transportation scenarios, lacking exploration of robot navigation for unstructured scenarios such as auxiliary roads, campus roads, and indoor settings. This paper investigates E2E robot navigation in unstructured road environments. First, we introduce two data collection pipelines -- one for real-world robot data and another for synthetic data generated using the Isaac Sim simulator -- which together produce an unstructured robotics navigation dataset, the FreeWorld Dataset. Second, we fine-tuned an efficient E2E autonomous driving model, VAD, using our datasets to validate the performance and adaptability of E2E autonomous driving models in these environments. Results demonstrate that fine-tuning through our datasets significantly enhances the navigation potential of E2E autonomous driving models in unstructured robotic environments. Thus, this paper presents the first dataset targeting E2E robot navigation tasks in unstructured scenarios, and provides a benchmark based on vision-based E2E autonomous driving algorithms to facilitate the development of E2E navigation technology for logistics and service robots. The project is available on GitHub.